Abstract: A classification system such as an Automatic Target
Recognition (ATR) system with N possible output labels (or decisions)
will have N(N-1) possible errors. The Receiver Operating
Characteristic (ROC) manifold was created to quantify all of
these errors. Truthed data will produce an approximation to a
ROC manifold. How well does the approximate ROC manifold
approximate the true ROC manifold? Several functionals exist
that quantify the approximation ability, but researchers really
wish to quantify the performance in the approximate ROC manifold.
This paper will review different performance definitions
for ROC curves and manifolds, and thus quantify the fusion of
ATR systems. Examples of different performances will be given
that are defined on manifolds.

Keywords: Performance, Evaluation, Classification System,
ROC Manifold, functional

I.  Introduction 

Given  a  classification  system,  how  does  one  quantify  its 
performance?  What  do  we  mean  by  the  system’s  performance? 
Is  there  a  best  performance  quantifier  to  use?  This  paper  will 
discuss  many  aspects  of  performances  of  a  system  and  a  family 
of  systems,  and  as  a  consequence,  will  define  the  performance 
of  the  fusion  of  systems.  Examples  of  different  performances 
will  be  given  in  the  Examples  section. 

II.  Mathematical  Background 

This  section  gives  the  essential  theory  and  notation  in  order 
to  discuss  the  different  performances  used  to  evaluate  the 
fusion  of  ATR  systems,  and  classification  systems. 

A.  Classification  Theory 

Let $\mathcal{E}$ be a population set of outcomes. These outcomes can
be real-life "events"1. An event could be a fixed-time event, a
space-time event, or a space-time-spectral event, to name a few
examples. Let $\mathfrak{E}$ be a $\sigma$-algebra of subsets of $\mathcal{E}$; then
$(\mathcal{E}, \mathfrak{E})$ is a measurable space [1]. Let $P$ be a probability measure
defined on $\mathfrak{E}$; then $(\mathcal{E}, \mathfrak{E}, P)$ is a probability measure space.
Let $s$ be a sensor that senses an event (i.e., an outcome) and
produces a (raw) datum as its output, i.e., $s : \mathcal{E} \to \mathcal{D}$, where
$\mathcal{D}$ is a (raw) data set. This data set may be too difficult to
quantify directly as it may be a collection or series of images
or sequences of audio signals, for instance. Thus, a mapping
$p$ defined on $\mathcal{D}$ produces an object called a feature that is a
more refined datum, typically a vector of real numbers. The
mapping $p$ then is a processor that takes a (raw) datum and
produces a refined datum vector, i.e., $p : \mathcal{D} \to \mathcal{F}$. Typically,
$\mathcal{F}$ is some finite dimensional space, but it need not be finite
dimensional nor a linear space. Let $a$ be a classifier mapping $\mathcal{F}$
into a label set $\mathcal{L}$, that is, $a : \mathcal{F} \to \mathcal{L}$. An example of a 2-label
set is $\mathcal{L} = \{\text{target}, \text{non-target}\}$. Our interest for this paper is a
label set with $N$ labels, say $\mathcal{L} = \{\ell_1, \ell_2, \ell_3, \dots, \ell_N\}$. The composition
of these mappings yields a classification system $A = a \circ p \circ s$.
The graphical representation of these mappings is given in the
following diagram.

1 In probability theory an event is a set of outcomes. Here we use it in the
informal sense.

The diagram for the system is written as

$$\mathcal{E} \xrightarrow{\;s\;} \mathcal{D} \xrightarrow{\;p\;} \mathcal{F} \xrightarrow{\;a\;} \mathcal{L}.$$
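The composition $A = a \circ p \circ s$ can be illustrated with a minimal Python sketch. The concrete `sensor`, `processor`, and `classifier` functions below are hypothetical stand-ins chosen only for illustration (an event is modeled as a single float); they are not taken from any ATR system.

```python
def sensor(event):
    """s : E -> D. Hypothetical sensor: three scaled readings of the event."""
    return [event, 2.0 * event, 3.0 * event]

def processor(datum):
    """p : D -> F. Hypothetical processor: reduce raw data to its mean."""
    return sum(datum) / len(datum)

def classifier(feature):
    """a : F -> L. Hypothetical two-label classifier with a fixed threshold."""
    return "target" if feature >= 1.0 else "non-target"

def compose(a, p, s):
    """Return the classification system A = a o p o s as a single callable."""
    return lambda event: a(p(s(event)))

A = compose(classifier, processor, sensor)
print(A(1.0), A(0.1))
```

The point of the sketch is only that the system $A$ acts directly on events, while the data set and feature set are intermediate stages hidden inside the composition.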


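Since $\mathcal{L} = \{\ell_1, \ell_2, \ell_3, \dots, \ell_N\}$ is finite, the power set of
$\mathcal{L}$, denoted by $\mathfrak{L}$, is the smallest $\sigma$-field of subsets of $\mathcal{L}$
that contains every singleton. Now define the collection of all
measurable systems [1] mapping $\mathcal{E}$ into $\mathcal{L}$ by

$$\mathscr{S} = \{A : \mathcal{E} \to \mathcal{L} \mid A \text{ is measurable}\}.$$

Let $\Theta$ be a set of parameters, where each parameter might be a
multidimensional vector. For each $\theta \in \Theta$ let $a_\theta$ be a
classifier mapping $\mathcal{F}$ into the label set $\mathcal{L}$. That is, $a_\theta : \mathcal{F} \to \mathcal{L}$
for each $\theta \in \Theta$. The composition of these mappings yields a
classification system $A_\theta = a_\theta \circ p \circ s$. We define the family
of the classification systems, or for brevity, the classification
system family (CSF), to be $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$. Thus, $\mathbb{A}$
is a subset of $\mathscr{S}$. We define the collection of families of
classification systems to be

$$\mathscr{F} = \{\mathbb{A} \subseteq \mathscr{S} : \mathbb{A} \text{ is nonempty}\}.$$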

B.  Two  Classification  Systems 

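Consider the case when two sensors, $s_1$ and $s_2$, observe
events occurring in the same population set $\mathcal{E}$. Assume they
produce data in the data sets $\mathcal{D}_1$ and $\mathcal{D}_2$, respectively. Further,
assume each sensor has its own processor, $p_1$ and $p_2$, which
map data in $\mathcal{D}_1$ to features in $\mathcal{F}_1$ and data in $\mathcal{D}_2$ to features in
$\mathcal{F}_2$, respectively. In particular, assume $p_1 : \mathcal{D}_1 \to \mathcal{F}_1$ and
$p_2 : \mathcal{D}_2 \to \mathcal{F}_2$. Suppose there is a family of classifiers for $p_1$
and $s_1$ given by $\{a_\theta : \theta \in \Theta\}$ and another family of classifiers
$\{b_\phi : \phi \in \Phi\}$ for $p_2$ and $s_2$, outputting labels in the label set
$\mathcal{L}$. Thus, $a_\theta : \mathcal{F}_1 \to \mathcal{L}$ for each $\theta \in \Theta$ and $b_\phi : \mathcal{F}_2 \to \mathcal{L}$
for each $\phi \in \Phi$. The composition of these mappings yields
classification systems represented by the diagram.

The diagram for label fusion for two systems is

$$\mathcal{E} \xrightarrow{\;s_1\;} \mathcal{D}_1 \xrightarrow{\;p_1\;} \mathcal{F}_1 \xrightarrow{\;a_\theta\;} \mathcal{L}, \qquad \mathcal{E} \xrightarrow{\;s_2\;} \mathcal{D}_2 \xrightarrow{\;p_2\;} \mathcal{F}_2 \xrightarrow{\;b_\phi\;} \mathcal{L}.$$

Now define the system $A_\theta = a_\theta \circ p_1 \circ s_1$ for each $\theta \in \Theta$
and $B_\phi = b_\phi \circ p_2 \circ s_2$ for each $\phi \in \Phi$, and denote the
two classification system families $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ and
$\mathbb{B} = \{B_\phi : \phi \in \Phi\}$.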

The two classification systems developed above map outcomes
from the population set into different data, feature,
and label sets, which are then used to fuse the classification
systems together. There are, however, other ways to label the
outcomes from the event set. In this discussion, classification
systems can map outcomes into either the same or different
data sets or the same or different feature sets. The sets
which must remain the same for the mathematical development
contained herein are the event set $\mathcal{E}$ and the two-class label set
$\mathcal{L}$. Therefore, the classification systems must act on
the same event set, map into either the same or different data
and feature sets, and eventually map into the same label set.
That is,

$$A_\theta : \mathcal{E} \to \mathcal{L} \quad \text{and} \quad B_\phi : \mathcal{E} \to \mathcal{L},$$

no matter which type of fusion it is. Given two CSFs $\mathbb{A}$ and
$\mathbb{B}$ and a fusion rule $\mathfrak{R}$, a new family $\mathbb{C}$ is produced and
defined by

$$\mathbb{C} = \mathfrak{R}(\mathbb{A}, \mathbb{B}) = \{\mathfrak{R}(A_\theta, B_\phi) : \theta \in \Theta,\ \phi \in \Phi\}.$$
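As a rough illustration of how a fusion rule acts on two families, the sketch below builds the fused family indexed by parameter pairs. Everything concrete here is an assumption made for the example: events are pairs of floats (one reading per sensor), each family is a set of threshold systems, and the AND and OR label rules are simply two familiar candidate fusion rules.

```python
from itertools import product

def threshold_family(index, thresholds):
    """Hypothetical family {A_theta}: label 't' when reading `index` >= theta."""
    return {th: (lambda e, th=th: "t" if e[index] >= th else "n")
            for th in thresholds}

def and_rule(label_a, label_b):
    """Declare a target only when both systems declare a target."""
    return "t" if label_a == "t" and label_b == "t" else "n"

def or_rule(label_a, label_b):
    """Declare a target when either system declares a target."""
    return "t" if label_a == "t" or label_b == "t" else "n"

def fuse(rule, fam_a, fam_b):
    """Fused family: one system R(A_theta, B_phi) per pair (theta, phi)."""
    return {(th, ph): (lambda e, a=a, b=b: rule(a(e), b(e)))
            for (th, a), (ph, b) in product(fam_a.items(), fam_b.items())}

A = threshold_family(0, [0.3, 0.7])
B = threshold_family(1, [0.5])
C_and = fuse(and_rule, A, B)
C_or = fuse(or_rule, A, B)
```

Note that the fused family is parameterized by the product set of the two parameter sets, so its size grows multiplicatively with the sizes of the input families.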

D.  Receiver  Operating  Characteristic  (ROC)  Curves 

For a 2-class label set $\mathcal{L} = \{t, n\}$ ($t$ denotes target and
$n$ denotes non-target) the errors are false positive (type I
error, $\alpha$) and false negative (type II error, $\beta$). Let $P_{FP}(A_\theta)$
denote the probability that the classification system $A_\theta$ labels
an event as a target, $t$, given that the event is really a
non-target event. Let $P_{FN}(A_\theta)$ denote the probability of false
negative classification by the system $A_\theta$; that is, $P_{FN}(A_\theta)$ is the
probability that the classification system $A_\theta$ labels an event as
a non-target, $n$, given that the outcome is really a target
event. Correspondingly, $P_{TP}(A_\theta) = 1 - P_{FN}(A_\theta)$ is the probability
of true positive classification. The ROC curve is the graph of the
ROC function.

Definition 1: (ROC function, ROC curve) Let $\mathbb{A} = \{A_\theta :
\theta \in \Theta\}$ be a family of classification systems defined on the
probability space $(\mathcal{E}, \mathfrak{E}, P)$ mapping to the label set $\mathcal{L} = \{t, n\}$
with parameter set $\Theta$. For each $p \in [0, 1]$, define the set

$$\Theta_p = \{\theta \in \Theta : P_{FP}(A_\theta) \le p\}.$$

For $p \in [0, 1]$, if $\Theta_p$ is nonempty then define

$$f_{\mathbb{A}}(p) = \max\{P_{TP}(A_\theta) : \theta \in \Theta_p\}. \quad (1)$$

If $\Theta_p$ is empty then $f_{\mathbb{A}}(p)$ is not defined. The function $f_{\mathbb{A}}$ is
called the ROC function. The graph of $f_{\mathbb{A}}$ is called the ROC
curve.
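For a finite family whose operating points have already been estimated, equation (1) reduces to a constrained maximum. The sketch below assumes the family is supplied as a list of hypothetical $(P_{FP}, P_{TP})$ pairs and that the constraint defining the feasible parameter set is the non-strict inequality $P_{FP} \le p$; it returns `None` where the ROC function is not defined.

```python
def roc_function(points, p):
    """Evaluate f(p) = max{P_TP : P_FP <= p} over a finite family.

    points: iterable of (P_FP, P_TP) pairs, one per system in the family.
    Returns None when no system satisfies the constraint (f undefined).
    """
    feasible = [tp for fp, tp in points if fp <= p]
    return max(feasible) if feasible else None

# Hypothetical operating points, one per parameter value theta.
points = [(0.1, 0.5), (0.2, 0.8), (0.4, 0.7), (0.6, 0.95)]

for p in (0.05, 0.1, 0.5, 1.0):
    print(p, roc_function(points, p))
```

The point (0.4, 0.7) never attains the maximum because the system at (0.2, 0.8) has both a lower false positive rate and a higher true positive rate, which is why the ROC function is nondecreasing in $p$.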

Since every classification system family will have a ROC
curve (determined by the parameter set), there is a
mapping $F$ that takes a CSF $\mathbb{A}$ and produces its ROC curve
$f_{\mathbb{A}}$. That is, $F(\mathbb{A}) = f_{\mathbb{A}}$.


C.  Fusion  Rules 

There  are  two  types  of  fusion  for  classification  systems.  The 
first  type  allows  for  the  families  of  classification  systems  which 
are  to  be  fused  to  have  exactly  the  same  label  set.  We  mean 
exactly  the  same,  and  not  isomorphic,  so  that  if  the  label  set 
is,  in  fact,  C  =  {target,  non-target}  for  each  family,  then  this 
means  that  the  actual  definition  of  a  target  label  is  identical  for 
each.  This  allows  for  each  family  to  partition  the  population 
set  in  the  same  way.  This  type  of  information  fusion  we  call 
within fusion; the other type is called across fusion [2].


E.  ROC  Manifolds 

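Assume the label set $\mathcal{L} = \{\ell_1, \ell_2, \dots, \ell_n\}$ where $n > 2$,
and the classification system $A : \mathcal{E} \to \mathcal{L}$ is designed to map
the outcomes in the event set $\mathcal{E}_i \subset \mathcal{E}$ to $\ell_i$ for each $i = 1, \dots, n$.
Define the probability of true positive classification for a given
label $\ell_i$ of the classification system $A$ by the conditional
probability

$$P_{i|i}(A) = \Pr\{A(e) = \ell_i \mid e \in \mathcal{E}_i\} = \frac{\Pr\left(A^{\natural}(\{\ell_i\}) \cap \mathcal{E}_i\right)}{\Pr(\mathcal{E}_i)},$$

where $A^{\natural}(\{\ell_i\})$ denotes the pre-image of $\{\ell_i\}$ under $A$.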
The probability that system $A$ classifies an outcome as label
$\ell_i$ when the outcome truly has label $\ell_j$ is

$$P_{i|j}(A) = \Pr\{A(e) = \ell_i \mid e \in \mathcal{E}_j\} = \frac{\Pr\left(A^{\natural}(\{\ell_i\}) \cap \mathcal{E}_j\right)}{\Pr(\mathcal{E}_j)}. \quad (2)$$

We use the notation $P_{i|j}(A)$ to convey the fact that $P_{i|j}$ is
a real-valued function with the system $A$ as its input. The
conjunctive equations of the system are

$$\sum_{i=1}^{n} P_{i|j}(A) = 1 \quad \text{for each } j = 1, 2, \dots, n \quad (3)$$

and are true for every system $A : \mathcal{E} \to \mathcal{L}$ [3], [4]. Only the
$i|i$ terms are correct classifications; the other $n - 1$ terms are
the errors of system $A$ and, consequently, from equations (3)
we have

$$\sum_{i=1,\, i \ne j}^{n} P_{i|j}(A) = 1 - P_{j|j}(A) \quad \text{for each } j = 1, 2, \dots, n. \quad (4)$$

For system $A$ define the $n \times n$ matrix $P(A)$ to be the matrix
whose $i,j$ entry is the value $P_{i|j}(A)$ for every $i,j \in \{1, \dots, n\}$,
that is,

$$P(A)_{ij} = P_{i|j}(A) = \frac{\Pr\left(A^{\natural}(\{\ell_i\}) \cap \mathcal{E}_j\right)}{\Pr(\mathcal{E}_j)}.$$

Notice that the diagonal entries of the matrix $P(A)$ are the
correct classifications and the off-diagonal entries are the
errors associated with misclassification. By property (3) the
transposed matrix $P(A)^T$ is a stochastic matrix. Also, all the
entries of this matrix have values lying in the interval $[0, 1]$.
Let $\mathcal{M}_n$ denote the set of $n \times n$ matrices whose entries lie in
$[0, 1]$, that is,

$$\mathcal{M}_n = \{M = (M_{ij}) : M_{ij} \in [0, 1] \text{ for every } i,j \in \{1, 2, \dots, n\}\};$$

then $P(A) \in \mathcal{M}_n$. Define the matrix $J \in \mathcal{M}_n$ by

$$J = \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & \cdots & 1 \\ 1 & 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 0 \end{pmatrix}.$$

Matrix $J$ will be used to remove the correct classifications
and keep only the errors of the system. Specifically, let $\widetilde{P}(A)$
denote the $n \times n$ matrix given by the Hadamard product with
$J$,

$$\widetilde{P}(A) = J \odot P(A).$$

Let $\mathcal{Z}_n$ denote the set of matrices in $\mathcal{M}_n$ with zero diagonal
entries and off-diagonal entries that are real numbers between 0
and 1, that is,

$$\mathcal{Z}_n = \{M \in \mathcal{M}_n : M_{i,i} = 0 \text{ for all } i = 1, 2, \dots, n\}.$$

Now we define the error set of the classification system
family.

Definition 2: (Error set) Given a classification system family
$\mathbb{A}$ define its error set $E_{\mathbb{A}}$ to be

$$E_{\mathbb{A}} = \{\widetilde{P}(A) : A \in \mathbb{A}\}.$$

Observe that $E_{\mathbb{A}} \subset \mathcal{Z}_n$ since the diagonal entries are
always zero. This set is comprised of many points (matrices);
however, we seek those "closest" to the origin (zero matrix)
as we define in the following ROC function.
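The matrix of conditional probabilities and its error part can be estimated from truthed data. The following sketch is one straightforward way to do so in plain Python; the function names are hypothetical, labels are encoded as integers 0 to n-1, and it assumes every label occurs at least once in the truth so that each column can be normalized.

```python
def confusion_matrix(truth, predicted, n):
    """Estimate P[i][j], the fraction of true-label-j outcomes labeled i.

    Each column sums to 1, mirroring the conjunctive equations (3).
    Assumes every label 0..n-1 appears at least once in `truth`.
    """
    counts = [[0] * n for _ in range(n)]
    for j, i in zip(truth, predicted):
        counts[i][j] += 1
    col_total = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [[counts[i][j] / col_total[j] for j in range(n)] for i in range(n)]

def error_matrix(P):
    """Hadamard product with J: zero the diagonal, keep the errors."""
    n = len(P)
    return [[0.0 if i == j else P[i][j] for j in range(n)] for i in range(n)]

# Hypothetical truthed data for a 3-label system.
truth     = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 2, 1, 1, 2, 2, 0, 2]
P = confusion_matrix(truth, predicted, 3)
E = error_matrix(P)
```

With this convention the column sums of the error matrix reproduce equation (4): each equals one minus the corresponding diagonal entry of the full matrix.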

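Definition 3: (ROC function) Given a classification system
family $\mathbb{A}$ define its ROC function $T_{\mathbb{A}}$ to be, for every $P \in \mathcal{Z}_n$,

$$T_{\mathbb{A}}(P) = \begin{cases} \text{smallest } \alpha \ge 0 & \text{when } \alpha P \in E_{\mathbb{A}} \\ \infty & \text{when } \alpha P \notin E_{\mathbb{A}} \text{ for all } \alpha \ge 0 \end{cases} \;=\; \min\{\alpha \in [0, \infty] : \alpha P \in E_{\mathbb{A}}\}.$$

Therefore, $T_{\mathbb{A}} : \mathcal{Z}_n \to [0, \infty]$.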
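For a finite error set, the ROC function can be evaluated by a direct search: find every matrix in the error set that is a nonnegative multiple of the query matrix and keep the smallest multiplier. The sketch below is an illustration under that finiteness assumption; the helper names and the numerical tolerance are choices made for the example.

```python
import math

def scale_factor(P, Q, tol=1e-9):
    """Return alpha >= 0 with Q = alpha * P entrywise, or None if none exists."""
    p = [x for row in P for x in row]
    q = [x for row in Q for x in row]
    alpha = None
    for pi, qi in zip(p, q):
        if abs(pi) > tol:          # first nonzero entry of P fixes alpha
            alpha = qi / pi
            break
    if alpha is None:              # P is the zero matrix
        return 0.0 if all(abs(x) <= tol for x in q) else None
    if alpha < 0:
        return None
    same = all(abs(qi - alpha * pi) <= tol for pi, qi in zip(p, q))
    return alpha if same else None

def roc_T(P, error_set):
    """Smallest alpha with alpha * P in the (finite) error set, else infinity."""
    alphas = [scale_factor(P, Q) for Q in error_set]
    alphas = [a for a in alphas if a is not None]
    return min(alphas) if alphas else math.inf

# Hypothetical error set of three 2x2 error matrices.
error_set = [[[0.0, 0.2], [0.4, 0.0]],
             [[0.0, 0.1], [0.2, 0.0]],
             [[0.0, 0.3], [0.1, 0.0]]]
```

A matrix with value 1 under this function lies in the error set and has no smaller positive multiple of itself in the error set, which matches the informal idea of being "closest" to the origin along its ray.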

The idea of this definition comes from Minkowski's functional
(see [5] for its use in optimization). An equivalent
definition of $T_{\mathbb{A}}$, useful for computations, is given in the
following theorem.

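Theorem 1: Given a classification system family $\mathbb{A}$ with $n$
labels, for every $P \in \mathcal{Z}_n$ with $P \ne 0$,

$$T_{\mathbb{A}}(P) = \min\left\{ \frac{\|Q\|}{\|P\|} : Q \in E_{\mathbb{A}} \text{ and } \frac{Q}{\|Q\|} = \frac{P}{\|P\|} \right\}$$

and $T_{\mathbb{A}}(P) = \infty$ otherwise.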

The  ROC  function  allows  us  to  define  the  frontier  of  the 
error  set,  which  we  call  the  ROC  frontier.  The  ROC  frontier 
will  be  the  ROC  manifold  if  it  satisfies  the  manifold  criterion. 

Definition 4: (Manifold) [6] A manifold is a topological
space that is locally Euclidean, that is, around every point
in the space, there is a neighborhood that is topologically the
same as the open unit ball in $\mathbb{R}^m$ for some positive integer
$m$.

Definition 5: (ROC frontier, ROC manifold) Given a classification
system family $\mathbb{A}$ with $n$ labels, define its ROC frontier
$M_{\mathbb{A}}$ to be the set

$$M_{\mathbb{A}} = \{P \in \mathcal{Z}_n : T_{\mathbb{A}}(P) = 1\}.$$

If this set is a manifold, then $M_{\mathbb{A}}$ is called the ROC manifold
for $\mathbb{A}$.

For $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ we assume the parameter set, $\Theta$,
is homeomorphic to $\mathbb{R}^m$ for some positive integer $m$ with the
usual Euclidean topology. Consequently, the error set, $E_{\mathbb{A}}$,
is also homeomorphic to some finite dimensional space, and
thus, $M_{\mathbb{A}}$ will be a manifold.

Let $\mathscr{R}$ denote the collection of ROC curves ($n = 2$) or
manifolds ($n > 2$). That is,

$$\mathscr{R} = \{f_{\mathbb{A}} : \mathbb{A} \text{ is a CSF defined on } \mathcal{E}\} \quad \text{for } n = 2,$$
$$\mathscr{R} = \{M_{\mathbb{A}} : \mathbb{A} \text{ is a CSF defined on } \mathcal{E}\} \quad \text{for } n > 2.$$
III. Performances

We choose a real-valued functional $\rho$ that takes a system $A$
as its input and yields a nonnegative real number as its output. We
call $\rho(A)$ the performance of $A$, and $\rho$ is called a performance
functional. Without loss of generality, we assume that a larger
value of $\rho(A)$ is better performance. Consequently, given two
systems $A$ and $B$, if

$$\rho(A) \le \rho(B)$$

then we say $B$ is better than $A$ with respect to $\rho$. This will
induce a partial ordering $\overset{\rho}{\preceq}$ on systems in $\mathscr{S}$, and hence, we
write

$$A \overset{\rho}{\preceq} B.$$

The system performance functional $\rho$ ($\mathscr{S}$-functional for
brevity) induces a family performance functional $\varrho$ ($\mathscr{F}$-functional
for brevity) on a classification system family $\mathbb{A}$ by
the following definition

$$\varrho(\mathbb{A}) = \max_{A \in \mathbb{A}} \rho(A) = \max_{\theta \in \Theta} \rho(A_\theta).$$

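Problem 1: Given a performance $\mathscr{F}$-functional $\varrho$ and given
a set of label fusion rules LABrules we seek the best fusion
rule $\mathfrak{R}^* \in$ LABrules such that the performance

$$\varrho(\mathfrak{R}^*(\mathbb{A}, \mathbb{B})) \ge \varrho(\mathfrak{R}(\mathbb{A}, \mathbb{B}))$$

for all choices $\mathfrak{R} \in$ LABrules. That is,

$$\varrho(\mathfrak{R}^*(\mathbb{A}, \mathbb{B})) = \max_{\mathfrak{R} \in \text{LABrules}} \varrho(\mathfrak{R}(\mathbb{A}, \mathbb{B})).$$

The optimal classification system family will be $\mathbb{C}^* =
\mathfrak{R}^*(\mathbb{A}, \mathbb{B})$ and the optimal fusion rule $\mathfrak{R}^*$ indicates how the
two families will be fused. But have we done fusion here? It
depends on the performance. If

$$\varrho(\mathfrak{R}^*(\mathbb{A}, \mathbb{B})) > \max\{\varrho(\mathbb{A}), \varrho(\mathbb{B})\}$$

then $\mathbb{C}^* = \mathfrak{R}^*(\mathbb{A}, \mathbb{B})$ is THE optimal classification system
family (with respect to $\varrho$).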
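Problem 1 can be made concrete with a small search over candidate rules. In the sketch below every specific is an assumption for illustration: the system functional is the fraction of truthed events labeled correctly, the family functional is the maximum of that score over the family, events are pairs of sensor readings, and the only candidate rules are AND and OR.

```python
def rho_correct(system, data):
    """Hypothetical system functional: fraction of truthed events labeled correctly."""
    return sum(system(e) == label for e, label in data) / len(data)

def varrho(family, data):
    """Induced family functional: the best system performance in the family."""
    return max(rho_correct(system, data) for system in family)

def threshold_family(index, thresholds):
    """Hypothetical family of threshold systems on reading `index`."""
    return [lambda e, th=th: "t" if e[index] >= th else "n" for th in thresholds]

def fuse(rule, fam_a, fam_b):
    """Fused family: one system per pair of member systems."""
    return [lambda e, a=a, b=b: rule(a(e), b(e)) for a in fam_a for b in fam_b]

RULES = {
    "AND": lambda x, y: "t" if x == "t" and y == "t" else "n",
    "OR": lambda x, y: "t" if x == "t" or y == "t" else "n",
}

def best_rule(fam_a, fam_b, data):
    """Score each candidate rule and return the best name with all scores."""
    scores = {name: varrho(fuse(rule, fam_a, fam_b), data)
              for name, rule in RULES.items()}
    return max(scores, key=scores.get), scores

# Hypothetical truthed events: (reading 1, reading 2) with a true label.
data = [((0.9, 0.8), "t"), ((0.8, 0.9), "t"), ((0.9, 0.2), "n"),
        ((0.1, 0.9), "n"), ((0.2, 0.1), "n")]
A = threshold_family(0, [0.5])
B = threshold_family(1, [0.5])
name, scores = best_rule(A, B, data)
```

On this toy data the AND rule scores higher than either family alone, which is the strict-improvement situation described in the text; on other data the best fused family may do no better than the better of the two inputs.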

Definition 6: (Dual Set) Define the collection $\mathscr{S}^*$ of real-valued
functionals $\rho$ defined on systems mapping outcomes
from the measurable space $(\mathcal{E}, \mathfrak{E})$ into the label set $\mathcal{L}$ to be

$$\mathscr{S}^* = \{\rho : \mathscr{S} \to \mathbb{R}\}.$$

We call $\mathscr{S}^*$ the dual2 set of $\mathscr{S}$. We are interested in
nonnegative functionals so we restrict this set further.

Definition 7: (Nonnegative Dual Set) Define the collection
$\mathscr{S}^{*+}$ of nonnegative, real-valued functionals $\rho$ defined
on systems mapping outcomes from the measurable space
$(\mathcal{E}, \mathfrak{E})$ into the label set $\mathcal{L}$ to be

$$\mathscr{S}^{*+} = \{\rho : \mathscr{S} \to \mathbb{R}^+\}$$

where $\mathbb{R}^+ = \{r \in \mathbb{R} : r \ge 0\}$.

2 We take care in using the term "dual set" here so that it is not confused
with the "dual space" found in functional analysis. A dual space is a linear
space of linear functionals over the field in use. We do not assume our
functionals are linear since we do not assume the sets have algebraic structure
that makes them linear spaces. If the label set were, in fact, a subfield of
$\mathbb{R}$ then it would be the same.


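Definition 8: Define the collection $\mathscr{F}^{*+}$ of nonnegative,
real-valued functionals $\varrho$ defined on classification system
families mapping from the measurable space $(\mathcal{E}, \mathfrak{E})$ into $\mathcal{L}$
to be

$$\mathscr{F}^{*+} = \{\varrho : \mathscr{F} \to \mathbb{R}^+\}.$$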

Now, suppose the performance $\varrho(\mathbb{A})$ is determined via the
ROC curve $f_{\mathbb{A}}$ (or the ROC manifold), that is, assume there
is a ROC functional $\varphi$ that maps a ROC curve (or manifold)
to a number (see [3]); then

$$\varrho(\mathbb{A}) = \varphi(f_{\mathbb{A}}). \quad (5)$$

Define the mapping $F$ that takes a CSF $\mathbb{A}$ and outputs its ROC
curve $f_{\mathbb{A}}$. Therefore, equation (5) can be written as

$$\varrho(\mathbb{A}) = \varphi(F(\mathbb{A})).$$

Definition 9: (ROC functional) Define the collection $\mathscr{R}^{*+}$
of nonnegative, real-valued functionals $\varphi$ defined on ROC
curves/manifolds to be

$$\mathscr{R}^{*+} = \{\varphi : \mathscr{R} \to \mathbb{R}^+\}.$$

That is, given a ROC manifold $M_{\mathbb{A}}$ and a ROC functional
$\varphi$, then $\varphi(M_{\mathbb{A}})$ is a nonnegative real number. Since we wish
this to be true for all CSFs, we have

$$\varrho = \varphi \circ F.$$

This equation tells us that given a ROC functional $\varphi$ one can
make the performance functional $\varrho$ just by the composition
with $F$. Therefore, there is an induced mapping $F^*$, called
the conjugate mapping (that is, conjugate to $F$), defined by

$$\varrho = F^*(\varphi) = \varphi \circ F.$$

Thus, $F^* : \mathscr{R}^{*+} \to \mathscr{F}^{*+}$.
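The conjugate mapping is an ordinary higher-order function: it takes a functional on ROC curves and returns a functional on families. The sketch below assumes a finite two-label family given as hypothetical $(P_{FP}, P_{TP})$ pairs, and uses "true positive rate at a false positive rate of at most 0.10" as an illustrative ROC functional, with the value 0 where the curve is undefined.

```python
def F(points):
    """Stand-in for F: map a finite family to its ROC function.

    points: list of (P_FP, P_TP) pairs, one per system in the family.
    """
    def f(p):
        feasible = [tp for fp, tp in points if fp <= p]
        return max(feasible) if feasible else None
    return f

def F_star(phi):
    """Conjugate mapping: turn a ROC functional phi into phi o F on families."""
    return lambda family: phi(F(family))

def phi_tp_at_10pct(f):
    """Hypothetical ROC functional: f(0.10), or 0 where f is undefined."""
    value = f(0.10)
    return 0.0 if value is None else value

varrho = F_star(phi_tp_at_10pct)
print(varrho([(0.05, 0.6), (0.2, 0.9)]))
```

Because the chosen ROC functional never returns a negative number, neither does the induced family functional.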

Now  we  can  restate  problem  1  as 

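Problem 2: Given a ROC functional $\varphi$, and given a set of
label fusion rules LABrules, we seek the best fusion rule $\mathfrak{R}^* \in$
LABrules such that the performance

$$\varphi(F(\mathfrak{R}^*(\mathbb{A}, \mathbb{B}))) \ge \varphi(F(\mathfrak{R}(\mathbb{A}, \mathbb{B})))$$

for all choices $\mathfrak{R} \in$ LABrules. That is,

$$\varphi(F(\mathfrak{R}^*(\mathbb{A}, \mathbb{B}))) = \max_{\mathfrak{R} \in \text{LABrules}} \varphi(F(\mathfrak{R}(\mathbb{A}, \mathbb{B}))). \quad (6)$$

This may not look like we are gaining anything new until
we look at the term $F(\mathfrak{R}(\mathbb{A}, \mathbb{B})) = f_{\mathfrak{R}(\mathbb{A},\mathbb{B})}$ (or $M_{\mathfrak{R}(\mathbb{A},\mathbb{B})}$). This
is the ROC manifold of the fused system.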

IV.  Results 

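This section contains the main result, which concerns determining
a ROC functional $\varphi$ given the performance functional
$\varrho$.

Theorem 2: Given an $\mathscr{R}$-functional $\varphi$ there exists a unique
$\mathscr{F}$-functional $\varrho$ given by

$$\varrho = \varphi \circ F.$$

This defines the conjugate mapping $F^*$ by
$F^*(\varphi) = \varphi \circ F$ for all $\varphi \in \mathscr{R}^*$.
The domain of $F^*$ is all of $\mathscr{R}^*$, and $F^* : \mathscr{R}^* \to \mathscr{F}^*$.

Proof: The proof of this theorem is straightforward. ■

Theorem 3: Given $\varphi \in \mathscr{R}^{*+}$ there exists a unique $\varrho \in \mathscr{F}^{*+}$
given by the conjugate mapping

$$F^*(\varphi) = \varphi \circ F.$$

The domain of $F^*$ is all of $\mathscr{R}^{*+}$, and $F^* : \mathscr{R}^{*+} \to \mathscr{F}^{*+}$.

Proof: Let $\varphi \in \mathscr{R}^{*+}$; then $\varphi(M_{\mathbb{A}}) \ge 0$ for every ROC
manifold $M_{\mathbb{A}} \in \mathscr{R}$. Then

$$[F^*(\varphi)](\mathbb{A}) = [\varphi \circ F](\mathbb{A}) = \varphi(F(\mathbb{A})) = \varphi(M_{\mathbb{A}}) \ge 0$$

for every $\mathbb{A} \in \mathscr{F}$. Therefore,

$$F^*(\varphi) \in \mathscr{F}^{*+}.$$ ■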

V.  Examples 

This section contains well-known examples of performance
quantifiers of classification systems [7]. Let $A$ denote a
classification system and $\mathbb{A}$ denote a family of classification
systems.


A.  case  n  =  2 

Assume the label set $\mathcal{L} = \{t, n\}$ for this subsection.

Example 1: True Positive (TP), also called the hit rate,
recall, and sensitivity

$$\rho_{TP}(A) = P_{TP}(A)$$

Example 2: True Negative (TN), also called the correct
rejection

$$\rho_{TN}(A) = P_{TN}(A)$$

Example 3: False Positive (FP), also called false alarm and
Type I error

$$\rho_{FP}(A) = P_{FP}(A)$$

Example 4: False Negative (FN), also called the Type II
error

$$\rho_{FN}(A) = P_{FN}(A)$$

Example 5: Accuracy (ACC)

$$\rho_{ACC}(A) = P_{TP}(A) + P_{TN}(A)$$

Example 6: Specificity (SPC)

$$\rho_{SPC}(A) = 1 - P_{FP}(A)$$

Example 7: Positive Predictive Value (PPV), also called
precision

$$\rho_{PPV}(A) = \frac{P_{TP}(A)}{P_{TP}(A) + P_{FP}(A)}$$

Example 8: Negative Predictive Value (NPV)

$$\rho_{NPV}(A) = \frac{P_{TN}(A)}{P_{TN}(A) + P_{FN}(A)}$$

Example 9: False Discovery Rate (FDR)

$$\rho_{FDR}(A) = \frac{P_{FP}(A)}{P_{FP}(A) + P_{TP}(A)}$$
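Examples 1 through 9 are all functions of the two conditional rates $P_{TP}$ and $P_{FP}$, since $P_{FN} = 1 - P_{TP}$ and $P_{TN} = 1 - P_{FP}$. The sketch below collects them in one hypothetical helper, following the formulas exactly as written in the examples (in particular, ACC is the sum of the two conditional rates); ratios with a zero denominator are reported as `None`.

```python
def ratio(numerator, denominator):
    """Safe division: None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator

def binary_metrics(tp, fp):
    """Examples 1-9 for a two-label system, from P_TP and P_FP.

    Uses P_FN = 1 - P_TP and P_TN = 1 - P_FP.
    """
    fn, tn = 1.0 - tp, 1.0 - fp
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "ACC": tp + tn,               # as defined in Example 5
        "SPC": 1.0 - fp,
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "FDR": ratio(fp, fp + tp),
    }

m = binary_metrics(tp=0.8, fp=0.2)
print(m["PPV"], m["NPV"], m["FDR"])
```

Two simple consistency checks follow directly from the formulas: PPV and FDR always sum to 1 when defined, and SPC coincides with TN.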

Example 10: Matthews Correlation Coefficient (MCC)
[8], [9] is used in machine learning as a means to quantify the
2-class classification system $A$. For brevity, let $tp = P_{TP}(A)$,
$tn = P_{TN}(A)$, $fp = P_{FP}(A)$ and $fn = P_{FN}(A)$; then

$$\mathrm{MCC}(A) = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}.$$

If any of the four sums in the denominator is zero, the
denominator can be arbitrarily set to one; this results in a
Matthews Correlation Coefficient of zero, which can be shown
to be the correct limiting value. It takes into account true
and false positives and negatives and is generally regarded
as a balanced quantifier which can be used even if the classes
are of very different sizes. It returns a value between -1 and
+1. A coefficient of +1 represents a perfect performance,
0 an average random performance and -1 the worst possible
performance. That is, it quantifies the performance of the
classification system $A$. Since MCC can be negative, we add
1 to get a nonnegative performance functional

$$\rho_{MCC}(A) = \mathrm{MCC}(A) + 1.$$

By the disjunction equations ($tp + fn = 1$ and $tn + fp = 1$) we see that

$$\mathrm{MCC}(A) = \frac{1 - (fp + fn)}{\sqrt{1 - (fp - fn)^2}}$$

so that

$$\rho_{MCC}(A) = \frac{1 - (fp + fn) + \sqrt{1 - (fp - fn)^2}}{\sqrt{1 - (fp - fn)^2}}.$$
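The simplification of MCC in terms of the two error rates can be checked numerically. The sketch below implements both the four-term form and the reduced form, using the zero-denominator convention described above for the four-term form; the function names are hypothetical, and the reduced form assumes the difference of the two error rates is strictly less than 1 in absolute value.

```python
import math

def mcc(tp, tn, fp, fn):
    """Four-term MCC; a zero denominator is replaced by one."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        denom = 1.0
    return (tp * tn - fp * fn) / denom

def mcc_from_errors(fp, fn):
    """Reduced form using tp = 1 - fn and tn = 1 - fp (needs |fp - fn| < 1)."""
    return (1.0 - (fp + fn)) / math.sqrt(1.0 - (fp - fn) ** 2)

def rho_mcc(fp, fn):
    """Nonnegative performance functional: MCC shifted up by one."""
    return mcc_from_errors(fp, fn) + 1.0

fp, fn = 0.1, 0.3
full = mcc(tp=1.0 - fn, tn=1.0 - fp, fp=fp, fn=fn)
reduced = mcc_from_errors(fp, fn)
print(full, reduced, rho_mcc(fp, fn))
```

Both forms agree because $(tp + fn) = (tn + fp) = 1$, leaving $(1 + fp - fn)(1 - fp + fn) = 1 - (fp - fn)^2$ under the square root.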

Example 11: Let $g(\xi, \eta)$ be a non-negative function for
every $(\xi, \eta) \in [0, 1]^2$; then consider the $\mathscr{S}$-functional

$$\rho_g(A) = g(P_{FP}(A), P_{TP}(A))$$

and the corresponding $\mathscr{F}$-functional is

$$\varrho_g(\mathbb{A}) = \max_{A \in \mathbb{A}} \rho_g(A) = \max_{A \in \mathbb{A}} g(P_{FP}(A), P_{TP}(A)).$$

All the previous examples have a well-defined choice of a
function $g$.

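Example 12: The area under the ROC curve (AUC) is a
functional defined as a Riemann integral

$$\varrho_{AUC}(\mathbb{A}) = \int_0^1 f_{\mathbb{A}}(p) \, dp.$$

This functional is NOT an $\mathscr{F}$-functional since

$$\varrho_{AUC}(\mathbb{A}) \ne \max_{A \in \mathbb{A}} \rho(A)$$

for any $\mathscr{S}$-functional $\rho$.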
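For a finite family the ROC function is a step function, so the integral can be computed exactly by summing rectangle areas. The sketch below assumes the family is given as hypothetical $(P_{FP}, P_{TP})$ pairs and, as an extra assumption not made in the text, treats the ROC function as 0 on the part of $[0, 1]$ where it is undefined.

```python
def auc_step(points):
    """Exact integral over [0, 1] of f(p) = max{P_TP : P_FP <= p}.

    points: list of (P_FP, P_TP) pairs. Where no system satisfies the
    constraint, f is taken to be 0 (an assumption of this sketch).
    """
    area, best, prev_fp = 0.0, 0.0, 0.0
    for fp, tp in sorted(points):
        area += best * (fp - prev_fp)   # rectangle up to the next jump
        best = max(best, tp)            # the step function never decreases
        prev_fp = fp
    return area + best * (1.0 - prev_fp)

points = [(0.0, 0.4), (0.5, 0.8)]
print(auc_step(points))
```

In this toy case the area depends on where the jump occurs as well as on the individual operating points, whereas a maximum over systems, such as the maximum true positive rate, looks at one system at a time.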

B.  case  n  >  2 

Example 13: Bayes Cost (BC) Given error cost values
$C_{i,j} \ge 0$, that is, the cost to make the $i,j$ error ($C_{i,i} = 0$),
then the Bayes cost for the system $A$ is

$$\rho_{BC}(A) = \sum_{i=1}^{n} \sum_{j=1}^{n} C_{i,j} \Pr(\mathcal{E}_j) P_{i|j}(A) = V \odot C \odot P(A)$$

when matrix $V = v \otimes \mathbf{1}^T$ where

$$\mathbf{1} = (1, 1, \dots, 1);$$

then

$$\varrho_{BC}(\mathbb{A}) = \max_{A \in \mathbb{A}} \rho_{BC}(A).$$
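The double sum in the Bayes cost is straightforward to evaluate once the cost matrix, the prior probabilities of the true labels, and the conditional probability matrix are available. The sketch below is a minimal illustration with hypothetical numbers for a two-label system; entry `P[i][j]` is the probability of assigning label i when the true label is j, and `prior[j]` is the probability of true label j.

```python
def bayes_cost(C, prior, P):
    """Sum over i, j of C[i][j] * prior[j] * P[i][j].

    C: cost matrix with zero diagonal; prior: probabilities of the true
    labels; P: conditional probabilities of label i given true label j.
    """
    n = len(prior)
    return sum(C[i][j] * prior[j] * P[i][j]
               for i in range(n) for j in range(n))

# Hypothetical two-label case: the two error types have unequal costs.
C = [[0.0, 1.0],
     [5.0, 0.0]]
prior = [0.5, 0.5]
P = [[0.9, 0.2],
     [0.1, 0.8]]
print(bayes_cost(C, prior, P))
```

Only the off-diagonal entries of the probability matrix contribute, because the diagonal of the cost matrix is zero.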

VI.  Conclusions 

There is a large collection of performance functionals to
choose from. To evaluate an ATR system one should consider
the performance criteria used. It might come down to analyzing
the ROC curve/manifold. In order to evaluate the fusion of
multiple system families, one needs to know the performance
functional used (see equation (6)).